On Power-Law Distributed Balls in Bins and Its Applications to View Size Estimation

Authors

  • Ioannis Atsonios
  • Olivier Beaumont
  • Nicolas Hanusse
  • Yusik Kim
Abstract

View size estimation plays an important role in query optimization. It has been observed that many data sets follow a power law distribution. In this paper, we consider the balls in bins problem in which balls are placed into $N$ bins and the bin selection probabilities follow a power law distribution. As a generalization of the coupon collector's problem, we address the problem of determining the expected number of balls that need to be thrown in order to have at least one ball in each of the $N$ bins. We prove that $\Theta\left(\frac{N^{\alpha} \ln N}{c_N^{\alpha}}\right)$ balls are needed to achieve this, where $\alpha$ is the parameter of the power law distribution, $c_N^{\alpha} = \frac{\alpha-1}{\alpha-N^{\alpha-1}}$ for $\alpha \neq 1$, and $c_N^{\alpha} = \frac{1}{\ln N}$ for $\alpha = 1$. Next, when fixing the number of balls thrown to $T$, we provide closed-form upper and lower bounds on the expected number of bins that have at least one occupant. For large $N$ and $\alpha > 1$, we prove that our bounds are tight up to a constant factor of $\left(\frac{\alpha}{\alpha-1}\right)^{1-\frac{1}{\alpha}} \leq e^{1/e} \simeq 1.4$.
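The two quantities discussed in the abstract can be explored numerically with a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes the power law takes the form $p_i \propto i^{-\alpha}$ for bins $i = 1, \dots, N$, it normalizes by direct summation rather than by any closed-form constant, and the function names are hypothetical. The expected number of occupied bins after $T$ throws uses the standard identity $\sum_i \left(1 - (1 - p_i)^T\right)$, which follows from linearity of expectation.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def power_law_probs(n_bins, alpha):
    """Selection probabilities p_i proportional to i**(-alpha), i = 1..n_bins.

    Assumption: this is the power-law form meant in the abstract.
    """
    weights = [i ** -alpha for i in range(1, n_bins + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_occupied(probs, n_balls):
    """Exact expected number of non-empty bins after n_balls independent throws."""
    return sum(1.0 - (1.0 - p) ** n_balls for p in probs)


def balls_until_full(probs, rng):
    """Throw balls until every bin holds at least one; return the number thrown."""
    cumulative = list(accumulate(probs))
    cumulative[-1] = 1.0  # guard against floating-point rounding in the last entry
    empty = set(range(len(probs)))
    thrown = 0
    while empty:
        thrown += 1
        # rng.random() lies in [0, 1), so the index is always a valid bin.
        empty.discard(bisect_right(cumulative, rng.random()))
    return thrown


if __name__ == "__main__":
    rng = random.Random(0)
    probs = power_law_probs(50, 1.0)
    trials = [balls_until_full(probs, rng) for _ in range(200)]
    print("mean balls to fill all bins:", sum(trials) / len(trials))
    print("expected occupied bins after 100 balls:", expected_occupied(probs, 100))
```

Averaging `balls_until_full` over many trials gives an empirical estimate of the cover time, which can be compared against the growth rate stated in the abstract for different values of `alpha` and `n_bins`.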


Similar resources

The onset of dominance in balls-in-bins processes with feedback

Consider a balls-in-bins process in which each new ball goes into a given bin with probability proportional to $f(n)$, where $n$ is the number of balls currently in the bin and $f$ is a fixed positive function. It is known that these so-called balls-in-bins processes with feedback have a monopolistic regime: if $f(x) = x^p$ for $p > 1$, then there is a finite time after which one of the bins will receive a...

Optimizing the Characteristics of the Motion of Steel Balls and their Impact on Shell Liners in SAG Mills

The equations governing the motion of steel balls and their impact on shell liners in industrial Semi-Autogenous Grinding (SAG) mills are derived in full detail by the authors and are used to determine the effective design variables for optimizing the working conditions of the mill and for avoiding severe impacts that lead to the breakage of SAG mill shell liners. These design vari...

Multiple-Choice Balanced Allocation in (Almost) Parallel

We consider the problem of resource allocation in a parallel environment where new incoming resources are arriving online in groups or batches. We study this scenario in an abstract framework of allocating balls into bins. We revisit the allocation algorithm GREEDY[2] due to Azar, Broder, Karlin, and Upfal (SIAM J. Comput. 1999), in which, for sequentially arriving balls, each ball chooses two ...

Expected number of uniformly distributed balls in a most loaded bin using placement with simple linear functions

We estimate the size of a most loaded bin in the setting when the balls are placed into the bins using a random linear function in a finite field. The balls are chosen from a transformed interval. We show that in this setting the expected load of the most loaded bins is constant. This is an interesting fact because using fully random hash functions with the same class of input sets leads to an ...

Extraction Kinetics and Physicochemical Studies of Terminalia catappa L Kernel Oil Utilization Potential

Kinetics and selected variables (temperature, particle size and time) for extraction of Terminalia Catappa L Kernel Oil (TCKO) were investigated using solvent extraction. Kinetic models studied were: parabolic diffusion, power law, hyperbolic, Elovich and pseudo-second-order. In ascending order, the best-fitted models at the optimum temperature and oil yield were Elovich’s model, hyperbolic...



Publication date: 2011